Overview

Dataset Statistics

Number of Variables 9
Number of Rows 8.9571e+06
Missing Cells 32
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 5.0 GB
Average Row Size in Memory 594.4 B
Variable Types
  • Categorical: 9

Dataset Insights

tconst has a high cardinality: 8957051 distinct values (High Cardinality)
primaryTitle has a high cardinality: 4128978 distinct values (High Cardinality)
originalTitle has a high cardinality: 4148962 distinct values (High Cardinality)
startYear has a high cardinality: 151 distinct values (High Cardinality)
endYear has a high cardinality: 98 distinct values (High Cardinality)
runtimeMinutes has a high cardinality: 870 distinct values (High Cardinality)
genres has a high cardinality: 2314 distinct values (High Cardinality)
tconst has all distinct values (Unique)
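
The insights above can be approximated with a short pandas sketch. The function name `cardinality_summary` and the cut-off of 50 are assumptions, not the profiler's own logic. The report flags columns with 98 or more distinct values and leaves titleType (11) and isAdult (9) unflagged, so any cut-off between those values would reproduce the same flags.

```python
import pandas as pd

def cardinality_summary(df: pd.DataFrame, threshold: int = 50) -> pd.DataFrame:
    """Per-column distinct counts with high-cardinality and uniqueness flags.

    `threshold` is a hypothetical cut-off; the report does not state its own.
    """
    n_rows = len(df)
    distinct = df.nunique(dropna=True)  # exact distinct count, missing values ignored
    return pd.DataFrame({
        "distinct": distinct,
        "unique_pct": (100.0 * distinct / n_rows).round(1),
        "high_cardinality": distinct > threshold,
        "all_distinct": distinct == n_rows,
    })

# Tiny stand-in frame; the real table has about 8.96 million rows.
sample = pd.DataFrame({
    "tconst": ["tt0000001", "tt0000002", "tt0000003", "tt0000004"],
    "titleType": ["short", "short", "short", "movie"],
})
summary = cardinality_summary(sample, threshold=3)
print(summary)
```

On the full table an exact `nunique` may be slow; the report itself lists approximate distinct counts.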

Variables


tconst

categorical

Approximate Distinct Count 8957051
Approximate Unique (%) 100.0%
Missing 0
Missing (%) 0.0%
Memory Size 635.3 MB

Length

Mean 9.3695
Standard Deviation 0.4827
Median 9
Minimum 9
Maximum 10

Sample

1st row tt0000001
2nd row tt0000002
3rd row tt0000003
4th row tt0000004
5th row tt0000005

Letter

Count 17914102
Lowercase Letter 17914102
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 66009227
  • tconst contains many words: 8957051 words
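
The tconst length and letter statistics can be cross-checked against each other. The sketch below assumes the row count equals the reported distinct count (8957051, since every value is unique), that every value consists only of lowercase letters and digits, and that the standard deviation is the population form.

```python
import math

rows = 8957051        # assumed equal to the distinct count (all values unique)
lowercase = 17914102  # "Lowercase Letter" from the Letter block
digits = 66009227     # "Decimal Number" from the Letter block

# Mean length = total characters / rows.
mean_len = (lowercase + digits) / rows

# Lengths are only 9 or 10, so the fractional part of the mean is the
# share of 10-character IDs, and the spread follows a two-point distribution.
p_long = mean_len - 9
std_len = math.sqrt(p_long * (1 - p_long))

print(round(mean_len, 4))   # compare with the reported Mean
print(round(std_len, 4))    # compare with the reported Standard Deviation
print(lowercase / rows)     # lowercase letters per ID (the "tt" prefix)
```

Under these assumptions the computed figures agree with the reported mean of 9.3695 and standard deviation of 0.4827, and each ID carries exactly two lowercase letters.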

titleType

categorical

Approximate Distinct Count 11
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 625.1 MB
  • The largest value (tvEpisode) is over 7.72 times larger than the second largest value (short)
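
A minimal sketch of how such a ratio could be computed, assuming "times larger" means the count of the most frequent value divided by the count of the runner-up. The helper name `top_two_ratio` and the toy data are illustrative, not taken from the profiler.

```python
import pandas as pd

def top_two_ratio(values: pd.Series) -> float:
    """Count of the most frequent value divided by the count of the second."""
    counts = values.value_counts()  # sorted in descending order, NaN excluded
    if len(counts) < 2:
        raise ValueError("need at least two distinct values")
    return counts.iloc[0] / counts.iloc[1]

# Toy column: 6 x tvEpisode, 2 x short, 1 x movie.
title_type = pd.Series(["tvEpisode"] * 6 + ["short"] * 2 + ["movie"])
ratio = top_two_ratio(title_type)
print(ratio)
```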

Length

Mean 8.1777
Standard Deviation 1.6044
Median 9
Minimum 5
Maximum 12

Sample

1st row short
2nd row short
3rd row short
4th row short
5th row short

Letter

Count 73247840
Lowercase Letter 65991466
Space Separator 0
Uppercase Letter 7256374
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (tvEpisode, short) take over 50.0%

primaryTitle

categorical

Approximate Distinct Count 4128978
Approximate Unique (%) 46.1%
Missing 11
Missing (%) 0.0%
Memory Size 743.6 MB

Length

Mean 19.6266
Standard Deviation 12.3258
Median 18
Minimum 1
Maximum 419

Sample

1st row Carmencita
2nd row Le clown et ses ch...
3rd row Pauvre Pierrot
4th row Un bon bock
5th row Blacksmith Scene

Letter

Count 129245667
Lowercase Letter 109093336
Space Separator 20418091
Uppercase Letter 20152331
Dash Punctuation 408221
Decimal Number 16914316
  • primaryTitle contains many words: 1120204 words
  • The most frequent word (episode) occurs over 4.62 times as often as the second most frequent word (dated)

originalTitle

categorical

Approximate Distinct Count 4148962
Approximate Unique (%) 46.3%
Missing 11
Missing (%) 0.0%
Memory Size 744.7 MB

Length

Mean 19.6266
Standard Deviation 12.3295
Median 18
Minimum 1
Maximum 419

Sample

1st row Carmencita
2nd row Le clown et ses ch...
3rd row Pauvre Pierrot
4th row Un bon bock
5th row Blacksmith Scene

Letter

Count 129227294
Lowercase Letter 109219987
Space Separator 20400458
Uppercase Letter 20007307
Dash Punctuation 414434
Decimal Number 16913789
  • originalTitle contains many words: 1151706 words
  • The most frequent word (episode) occurs over 4.62 times as often as the second most frequent word (dated)

isAdult

categorical

Approximate Distinct Count 9
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 563.8 MB
  • The largest value (0) is over 31.16 times larger than the second largest value (1)

Length

Mean 1
Standard Deviation 0.003026
Median 1
Minimum 1
Maximum 4

Sample

1st row 0
2nd row 0
3rd row 0
4th row 0
5th row 0

Letter

Count 1
Lowercase Letter 0
Space Separator 0
Uppercase Letter 1
Dash Punctuation 0
Decimal Number 8957077
  • The top 2 categories (0, 1) take over 50.0%

startYear

categorical

Approximate Distinct Count 151
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 587.1 MB
  • The largest value (\N) is over 2.88 times larger than the second largest value (2018)

Length

Mean 3.7335
Standard Deviation 0.6797
Median 4
Minimum 2
Maximum 4

Sample

1st row 1894
2nd row 1892
3rd row 1892
4th row 1892
5th row 1893

Letter

Count 1193691
Lowercase Letter 0
Space Separator 0
Uppercase Letter 1193691
Dash Punctuation 0
Decimal Number 31053440

endYear

categorical

Approximate Distinct Count 98
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 572.5 MB
  • The largest value (\N) is over 1547.55 times larger than the second largest value (2017)

Length

Mean 2.0203
Standard Deviation 0.2007
Median 2
Minimum 2
Maximum 4

Sample

1st row \N
2nd row \N
3rd row \N
4th row \N
5th row \N
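
Every sampled endYear value is the literal string `\N`, yet the report lists Missing as 0, which suggests the placeholder was read as an ordinary category. The sketch below shows one way to load such columns so that `\N` counts as missing and the year and runtime columns become numeric. The three rows reuse the first sample values shown in this report; the `QUOTE_NONE` setting and the column selection are assumptions rather than the profiler's configuration.

```python
import csv
import io
import pandas as pd

# Three tab-separated rows built from the first sample values in this report.
raw = (
    "tconst\tstartYear\tendYear\truntimeMinutes\n"
    "tt0000001\t1894\t\\N\t1\n"
    "tt0000002\t1892\t\\N\t5\n"
    "tt0000003\t1892\t\\N\t4\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    na_values=["\\N"],       # treat the \N placeholder as missing
    quoting=csv.QUOTE_NONE,  # assumption: do not interpret quote characters in fields
    dtype={"tconst": "string"},
)

# The report shows a few letters in runtimeMinutes, so coerce anything
# non-numeric to missing instead of raising.
for col in ["startYear", "endYear", "runtimeMinutes"]:
    df[col] = pd.to_numeric(df[col], errors="coerce").astype("Int64")

print(df.dtypes)
print(df.isna().sum())
```

With the columns typed this way, a profiler would report startYear, endYear, and runtimeMinutes as numerical and count the `\N` entries under Missing.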

Letter

Count 8865940
Lowercase Letter 0
Space Separator 0
Uppercase Letter 8865940
Dash Punctuation 0
Decimal Number 364444
  • The top 2 categories (\N, 2017) take over 50.0%

runtimeMinutes

categorical

Approximate Distinct Count 870
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 572.2 MB
  • The largest value (\N) is over 51.64 times larger than the second largest value (30)

Length

Mean 1.9839
Standard Deviation 0.2476
Median 2
Minimum 1
Maximum 23

Sample

1st row 1
2nd row 5
3rd row 4
4th row 12
5th row 1

Letter

Count 6549094
Lowercase Letter 82
Space Separator 0
Uppercase Letter 6549012
Dash Punctuation 9
Decimal Number 4671682
  • The top 2 categories (\N, 30) take over 50.0%

genres

categorical

Approximate Distinct Count 2314
Approximate Unique (%) 0.0%
Missing 10
Missing (%) 0.0%
Memory Size 648.3 MB
  • The largest value (Drama) is over 1.63 times larger than the second largest value (Comedy)

Length

Mean 10.8931
Standard Deviation 6.4093
Median 12
Minimum 2
Maximum 32

Sample

1st row Documentary,Short
2nd row Animation,Short
3rd row Animation,Comedy,R...
4th row Animation,Short
5th row Comedy,Short
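
The samples show that genres holds comma-separated lists, so the 2314 distinct values are combinations rather than individual genres. A minimal sketch of counting individual genres follows, using only the untruncated sample rows above (the third row is cut off in the report and is left out).

```python
import pandas as pd

# Untruncated sample rows from the report (rows 1, 2, 4 and 5).
genres = pd.Series([
    "Documentary,Short",
    "Animation,Short",
    "Animation,Short",
    "Comedy,Short",
])

# Split each value on commas, then give every genre its own row.
per_genre = genres.str.split(",").explode().value_counts()
print(per_genre)
```

Counting after the split gives per-genre frequencies, which can differ from the per-combination frequencies that the report summarizes.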

Letter

Count 89893178
Lowercase Letter 73183310
Space Separator 0
Uppercase Letter 16709868
Dash Punctuation 1968001
Decimal Number 0
  • genres contains many words: 2314 words

Interactions

Missing Values